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Abstract 

We present a thermodynamically robust coarse-grained model to simulate folding of RNA 
in monovalent salt solutions. The model includes stacking, hydrogen bond and electrostatic 
interactions as fundamental components in describing the stability of RNA structures. The 
stacking interactions are parametrized using a set of nucleotide-specific parameters, which 
were calibrated against the thermodynamic measurements for single-base stacks and base-pair 
stacks. All hydrogen bonds are assumed to have the same strength, regardless of their context 
in the RNA structure. The ionic buffer is modeled implicitly, using the concept of counterion 
condensation and the Debye-HUckel theory. The three adjustable parameters in the model 
were determined by fitting the experimental data for two RNA hairpins and a pseudoknot. 
A single set of parameters provides good agreement with thermodynamic data for the three 
RNA molecules over a wide range of temperatures and salt concentrations. In the process 
of calibrating the model, we establish the extent of counterion condensation onto the single- 
stranded RNA backbone. The reduced backbone charge is independent of the ionic strength 
and is 60% of the RNA bare charge at 37 °C. Our model can be used to predict the folding 
thermodynamics for any RNA molecule in the presence of monovalent ions. 
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Introduction 

Since the landmark discovery that RNA molecules can act as enzymes/ an increasing repertoire 
of cellular functions has been associated with RNA, raising the need to understand how these com- 
plex molecules fold into elaborate tertiary structures. In response to this challenge, great strides 
have been made in describing RNA folding.-""i^ Single molecule and ensemble experiments using 
a variety of biophysical methods, combined with theoretical techniques, have led to a conceptual 
framework for interpreting the thermodynamics and kinetics of RNA folding.-^— Despite these 
advances, there are very few reliable structural models with the ability to quantitatively predict the 
thermodynamic properties of RNA (see, however, refs. 11-17). The development of simple and 
accurate models is complicated by the interplay of several energy and length scales, which arise 
from stacking, hydrogen bond and electrostatic interactions. Although multiple interactions con- 
tribute to the stability of RNA, the most vexing of these are the electrostatic interactions, since the 
negatively charged phosphate groups make RNA a strongly charged poly electrolyte. ^^ Because of 
the strong intramolecular Coulomb repulsion, the magnitude of the charge on the phosphate groups 
has to be reduced in order for RNA molecules to fold. The softening of repulsion between the phos- 
phate groups requires the presence of counterions. A number of factors such as the Debye length, 
the Bjerrum length, the number of nucleotides in RNA, as well as the size, valence and shape 
of counterions— modulate electrostatic interactions, which further complicates the prediction of 
RNA folding thermodynamics. 

hi principle, all-atom simulations of RNA in water provide a straightforward route to com- 
puting RNA folding thermodynamics. However, uncertainties in nucleic acid force fields and the 
difficulty in obtaining adequate conformational sampling have prevented routine use of all-atom 
simulations to study the folding of even small RNA molecules. At the same time, the success 
of using polyelectrolyte theories— and simulations^^ in capturing many salient features of RNA 
folding justifies the development of coarse-grained (CG) models. None of the existing CG models 
of RNA, which have been remarkably successful in a variety of applications,—""— have been used 
to reproduce folding thermodynamics over a wide range of ion concentrations and temperature. In 



this paper, we introduce a force field based on a CG model in which each nucleotide is represented 
by three interactions sites (TIS) — a phosphate, a sugar and a base.— The TIS force field includes 
stacking, hydrogen bond and electrostatic interactions that are known to contribute significantly 
to the stability of RNA structures. We obtain the thermodynamic parameters for the stacking and 
hydrogen bond interactions by matching the simulation and experimental melting data for various 
nucleotide dimers and for the pseudoknot from mouse mammary tumor virus (MMTV PK in Fig- 
ure [T]). Our description of the electrostatic interactions in RNA relies on the concept of counterion 
condensation, which posits that counterions condense onto the sugar-phosphate backbone and par- 
tially reduce the charge on each phosphate group. Our simulations provide a way to determine the 
magnitude of the reduced backbone charge by fitting the experimental data for the ion-dependent 
stability of RNA hairpins (L8 and LIO in Figured])- Remarkably, experimental data on folding 
thermodynamics of the MMTV PK, L8, and LIO are reproduced well over a wide range of temper- 
atures and concentrations of monovalent salt using a single set of force field parameters. Our CG 
force field is transferable, and hence can be adopted for other RNA molecules as well. 

Methods 

Three Interaction Site (TIS) Representation of RNA 

In the TIS model, each nucleotide is replaced by three spherical beads P, S and B, representing a 
phosphate, a sugar and a base (Figure |2]). The coarse-grained beads are at the center of mass of the 
chemical groups. The energy function in the TIS model, f/jis^ has the following six components, 

UjlS = t/BL + f/BA + t/EV + t/sT + UuB + UeL, (1) 

which correspond to bond length and angle constraints, excluded volume repulsions, single strand 
base stacking, inter-strand hydrogen bonding and electrostatic interactions. We constrain bond 
lengths, p, and angles, a, by harmonic potentials, t/BL(p) = kp{p — po)^ and Uba{(x) = ka{cc — 



OCo)^, where the equilibrium values po and OCq are obtained by coarse-graining an ideal A-form 
RNA helix.— The values of kp, in units of kcalmol^^A^^, are: 64 for an S— )-P bond, 23 for an 
P— T-S bond ("—)•" indicates the downstream direction) and 10 for an S— B bond. The values of ^a 
are 5 kcalmol^^rad^^ if the angle involves a base, and 20 kcalmol^^rad^^ otherwise. 

Excluded volume between the interacting sites is modeled by a Weeks-Chandler- Andersen 
(WCA) potential. 



UEvir) = Co 



tTAt^^^' 



r<n 



0, 



[/Ev(r) = 0, r > Do, (2) 

which has been commonly used to study excluded volume effects in fluids.— The precise form of 
UEwir) will not affect the results as long as UEv{r) is short-ranged. The WCA potential is com- 
putationally efficient because it vanishes exactly beyond the contact distance Dq. To allow close 
approach between two bases that stack flat one on top of another, we assume Dq = 3.2 A and 
£0 = 1 kcal/mol for the interacting sites representing bases. With the exception of stacked bases, 
this choice of parameters underestimates the distance of closest approach between coarse-grained 
RNA groups. However, to keep the parameterization of the model as simple as possible, we use 
the same Dq and e for all interacting sites. We note that the specific choice of parameters in Eq. (|2]) 
has little effect on the results obtained. In our simulations, stable folds are sustained by stack- 
ing and hydrogen bond interactions, Usj and Uub, which are parameterized using experimental 
thermodynamic data and accurate approach distances between various RNA groups (see below). 

Stacking Interactions 

Single strand stacking interactions, Ust, are applied to any two consecutive nucleotides along the 

chain, 

Ust = — , (3) 

1 + kr{r - ro)2 + k^ (0i - ^i^o)^ + k^ (^2 - 02.o)^ ' 



where r, 0i and ^2 are defined in Figure |2l Sixteen distinct nucleotide dimers are modeled with 
different ro, ^i,Oj ^2,0 and f/gj. The structural parameters rg, ^1,0 and 02,0 are obtained by coarse- 
graining an A-form RNA helix. ^^ To estimate standard deviations of r and 0i, ^2 from the corre- 
sponding values in a A-form helix, we used double helices in the NMR structure of the pseudoknot 
from human telomerase RNA— (PDB code 2K96). We chose this pseudoknot because it has two 
fairly long stems containing six and nine Watson-Crick base pairs. We had previously conducted 
simulations of the two stems at 15 °C in the limit of high ionic strength ^^ and found that, for 
kr = lA A^^ and k,/, =4 rad^^, the time averages of (r — rg)^, (0i — 0i,o)^ and {^2 — ^2,0)^ agreed 
well with the standard deviations computed from the NMR structure. The time averages were not 
very sensitive to a specific choice of U^j. Using kr = 1.4 A^^ and k^ = A rad^^, we derive f/g^p 
from available thermodynamic measurements of single-stranded and double-stranded RNA,— ""— 
as described below. 

Thermodynamic parameters of dimers from experiments 

In the nearest neighbor model of RNA duplexes, the total stability of a duplex is given by a sum of 
successive contributions AG (^, j^^) , where x — y denotes a base pair stacked over the preceding base 
pair w — z. The enthalpy. A//, and entropy, A5, components of AG (^3^2) ^^ known experimentally 
at 1 M salt concentration.— Here, we make the following assumptions: 

M1\^~J^\ = mr\+Mir\+Q.5m{w-z)+Q.5Ml{x-y), 

where A//(^) and A5(^) are the thermodynamic parameters associated with stacking of x over w 
along 5' — 7- 3' in one strand. Additional enthalpy gain Mi{w — z) arises from hydrogen bonding 
between w and z in complementary strands. Our goal is to solve Eqs. (H) for A//('^), A5(^,) and 
A//(w — z). Since the number of unknowns exceeds the number of equations, we have to make 
some additional assumptions. 



We average the thermodynamic parameters on the left-hand side of Eqs. (Hj) for stacks (^ly) 
and (uIa)' (c-g) ^^'^ (g-c)' ^^'^ (c-g) ^^'^ (g-c)' because these values are similar within 
experimental uncertainty.'^ This allows us to assign A//(^) = AH(^) and A5(^) = AS('J^) on the 
right-hand side of Eqs. (H) for all dimers, except for (q) and (^). Additional simplifications result 
from the analysis of experimental data on stacking of nucleotide dimers.— "^^ Experiments indicate 
that dimers (^) , (^) and (^) have similar stacking propensities and can therefore be described by 
one set of thermodynamic parameters. The same holds for (^) and ((-.) . 

The melting temperature of dimer (^) is known from experiment, T^ = 26 °C.— According 
to the assumptions above, dimer (^) has the same melting temperature. Combining Eqs. dD for 
(aIu) ^^'^ *^ relationship AH{^) = kBTrnAS{^), where 7^ = 299 K and ^b is the Boltzmann 
constant, we can solve for Ai/(^), A5(^) and AH{A-U). By assigning AH[^ = AH(^) and 
A5(^) = A5(^) in Eqs. ® for (^lu), we solve for A//(JJ) and ASQ. Finally, we assume 

^(c)='^^(a)+('-*>^(u)- 

where ^ is a constant. This assumption is based on the observation that the measured enthalpy 
changes of duplex formation, AH{^^) in Eqs. dH), are approximately in proportion to the corre- 
sponding entropy changes, A5(^3^^'). Furthermore, from previous assumptions, the melting tem- 
perature of dimer ((^) should match the T^ of (^) , which is known experimentally to be 13 °C.— 
Using this result and Eqs. (|5]), we obtain A//((^) and A5((-.) . 

The enthalpies of hydrogen bond formation between Watson-Crick base pairs are related as 
AH{G - C) = 3/2A//(A - U), where A//(A - U) = - 1 .47 kcal/mol is the resuh of the calculation 
outlined above. The remaining thermodynamic parameters follow directly from Eqs. dU with no 
further approximations. The results are summarized in Table [T] The relative stacking propensities 
of dimers in Table [T] are consistent with experiments. -^^i^l 



Thermodynamic parameters of dimers from simulations 

To calibrate the model, we simulated stacking of coarse-grained dimers similar to that shown in 
Figure|2l We used the stacking potential t/sT in Eq. © with U^j = —h + k^{T — Tjn)s, where T (K) 
is the temperature, Tj^ (K) is the melting temperature given in Table [U and h and s are adjustable 
parameters. In simulations, we computed the stability AG of stacked dimers at temperature T using 

AG=-%rinp + %rin(l-p)+AGo, (6) 

where p is the fraction of all sampled configurations for which Ugj < —k^T (Figure |3]). The 
correction AGq in Eq. Q is assumed to be constant for all dimers and accounts for any differences 
in the definition of AG between experiments and simulations. 

Figure |3] shows the simulation values of AG for the dimer (^) , as a function of T. At AGq = 
and s — 0, the melting temperature Tm of (^), computed using AG(rni) = 0, increases with h and 
equals Tm in Table [H when h = 5.98 kcal/mol. If i' = 0, the entropy loss of stacking, given by 
the slope of AG(r) over T, is smaller than the value of AS specified in Table [T] To rectify this 
discrepancy we take Ugj = —5.98 + ^3(7' — Tm)s with s > 0, which does not alter Tm but allows us 
to adjust the slope of AG(r) by adjusting the value of s. We find that s — 5.30 is consistent with 
A5of (3 in Table [B 

We carried out the same fitting procedure for all coarse-grained dimers. The resulting param- 
eters C/g-p are summarized in Table |2]for AGq = 0.6 kcal/mol (AGq ~ k^T at room temperature). 
This value of AGq gives the best agreement between simulation and experiment (see also Results 
and Discussion). Note that, although some stacks have equivalent thermodynamic parameters in 
Table [U they have somewhat different f/g-j, due to their geometrical differences. 

Finally, the parameters Ugj in Tableware coupled to the specific choice of kr and k(j, in Eq. ([3]), 
since these coefficients determine how much entropy is lost upon formation of a model stack. For 
any reasonable choice of kr and fc^, the coarse-grained simulation model without explicit solvent 
will require correction factors s to match the experimental AS. If different values of kr and k^ 
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are chosen, the accuracy of the model will not be compromised as long as U^j (h and s) are also 
readjusted following the fitting procedure outlined above. 

Hydrogen Bond Interactions 

To model the RNA structures shown in Figure [U we use coarse-grained hydrogen bond interac- 
tions f/HB which mimic the atomistic hydrogen bonds present in the folded structure. The atom- 
istic structures of hairpins L8 and LIO have not been determined experimentally. We assume that 
the only hydrogen bonds stabilizing these hairpins come from six Watson-Crick base pairs in the 
hairpin stem. The NMR structure for the MMTV PK is available (PDB code IRNK^^). For the 
MMTV PK, we generated an optimal network of hydrogen bonds by submitting the NMR structure 
to the WHAT IF server at |http : //swift . cmbi . ru . nl ,. Each hydrogen bond is modeled by 
a coarse-grained interaction potential, 

t/HB = t/HBx[l+5(r-ro)2+ 1.5(01-01,0)^+ 1.5(02-02,0)' 

+0.15(v/-V/o)' + 0.15(v/i-V/i,o)2 + 0.15(v/2-V^2,o)']"\ (7) 

where r, 0i, 02, V^, i/Zi and i//2 are defined in Figure |4]for different coarse-grained sites. For Watson- 
Crick base pairs, the equilibrium values ro, 01.0, 02,0, V^o, V^i,o and V/2,0 are adopted from the coarse- 
grained structure of an ideal A-form RNA helix.— For all other bonds, the equilibrium parameters 
are obtained by coarse-graining the PDB structure of the RNA molecule. Our approach assumes 
that an A-form helix is an equilibrium state for RNA canonical secondary structure. Modeling of 
non-canonical base pairing and of the tertiary interactions is biased to the native structure. The 
coefficients 5, 1.5 and 0.15 in Eq. (jT]) were determined from the same simulations as kr and ktp 
in Eq. ([3]). Equation (|7]) specifies Uub for a single hydrogen bond and it must be multiplied by 
a factor of 2 or 3 if the same coarse-grained sites are connected by multiple bonds (as in base 
pairing). The geometry of Uub in Eq. (7} is the minimum necessary to maintain stable helices in 
the coarse-grained model. In particular, simulations of the MMTV PK (Figure [B at 10 °C yield 



the RMS deviation from the NMR structure of 1 .4 and 2.0 A for stems 1 and 2, respectively. 

In the present implementation of the model, the only hydrogen bonds included in simulation 
are those that are found in the PDB structure of the RNA molecule. However, large RNA molecules 
may have alternative patterns of secondary structure that are sufficiently stable to compete with the 
native fold. To account for this possibility, we have developed an extended version of the model 
where we allow the formation of any G— C, A— U or G— U base pair. Although easily implemented, 
this additional feature makes simulations significantly less efficient due to a large number of base 
pairing possibilities. A description of the extended model and its implementation for large RNA 
will be reported separately. For small RNA molecules, similar to the ones considered here, we find 
that the folding thermodynamics is largely unaffected by the inclusion of altemative base pairing. 

Electrostatic interactions 

To model electrostatic interactions, we employ the Debye-Hiickel approximation combined with 
the concept of counterion condensation,— which has been used previously to determine the re- 
duced charge on the phosphate groups in RNA.^^ The highly negatively charged RNA attracts 
counterions, which condense onto the sugar-phosphate backbone. The loss in translational entropy 
of a bound ion (in the case of spherical counterions) is compensated by an effective binding energy 
between the ion and RNA, thus making counterion condensation favorable. Upon condensation 
of counterions onto the RNA molecule, the charge of each phosphate group decreases from —e to 
—Qe, where Q<\ and e is the proton charge. The uncondensed mobile ions are described by the 
linearized Poisson-Boltzmann (or Debye-Hiickel) equation. It can be shown that the electrostatic 
free energy of this system is given by— 

where |r, — Vj\ is the distance between two phosphates i and j, £ is the dielectric constant of water 
and A is the Debye-Hiickel screening length. The value of the Debye length A must be calculated 
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individually for each buffer solution using 

^"'^-^E^'pn, (9) 

where qn is the charge of an ion of type n and p„ is its number density in the solution. If 
evaluated in units of A^, the number density p is related to the molar concentration c through 
p = 6.022 X lO^^c. In the simulation model, the free energy Gdh is viewed as the effective energy 
of electrostatic interactions between RNA phosphates, [/el = Gdh, and as such it contributes an 
extra term to the energy function in Eq. ([1). This implicit inclusion of the ionic buffer significantly 
speeds up simulations, leading to much enhanced sampling of RNA conformations. 

To complete our description of [/el, we still need to define the magnitude of the phosphate 
charge Q. For rod-like polyelectrolytes in monovalent salt solutions. Manning's theory of counte- 
rion condensation predictS'2^ 

where b is the length per unit (bare) charge in the polyelectrolyte and Zb is the Bjerrum length, 

h = -^-^. (11) 

According to Eq. (flOl ), the reduced charge 2 (2 = 1 in the absence of counterion condensation) 
does not depend on the concentration c of monovalent salt. The dependence of Q on T is nonlinear, 
since the dielectric constant of water decreases with the temperature,— 



e(r) = 87.740 -0.4008r + 9.398 X lO^^r^- 1.410 x 10"'^^^ (12) 



where T is in °C. 

We estimate b in Eq. (flOl) from available folding data for hairpins L8 and LI 0, which were mea- 
sured extensively in monovalent salt solutions of different ionic strength.— We find that b = 4.4 
A reproduces measured stabilities of these hairpins over a wide range of salt concentrations. As- 
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suming b = 4.4 A for any RNA in a monovalent salt solution, we obtain good agreement between 
simulation and experiment for the MMTV PK (Figure [I])- We propose that Eq. ([8]) is sufficient to 
describe salt dependencies of RNA structural elements such as double helices, loops and pseudo- 
knots. 

In our coarse-grained simulation model, individual charges are placed at the centers of mass 
of the phosphate groups (sites P). This can be compared to an atomistic representation of the 
phosphate group, where the negative charge is concentrated on the two oxygen atoms. A more 
detailed distribution of the phosphate charge is not expected to have a significant effect on the 
electrostatic interactions between different strands in an RNA molecule. For instance, the distance 
between two closest phosphate groups on the opposite strands of a double helix is approximately 1 8 
A, as compared to the distance between two atoms in a phosphate group of about 1 A. Furthermore, 
when considering a single strand, the dominant effect of the backbone charge distribution will be 
to modulate the magnitude of the reduced charge Q. If the density of the bare backbone charge 
is slightly underestimated then Eq. (flOl) will predict a larger value of Q. Therefore, fitting Q 
to experimental data allows us to compensate for small scale variations in the backbone charge 
density. 

Calculation of Stabilities 

We are interested in calculating the stability AG of the RNA structures shown in Figure [I] as a 
function of temperature T. However, the folded and unfolded states of RNA coexist only in a 
narrow range of T around the melting temperature. Thus, computing AG by means of direct 
sampling of the folding/unfolding transition at any T is not feasible. Below we derive a formula for 
AG(r) from fundamental thermodynamic relationships that enables us to circumvent this problem. 
Consider the Gibbs free energy of the folded state, Gf = Hf— TSf. We can write the following 
exact expressions for the enthalpy Hf and entropy 5f, 

dHf 



H,{T)=H,{T*)+l^^^dT 
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S,{T)=SfiT*)+J^^-^dT, (13) 

where T* is an arbitrary reference temperature. The derivatives in Eq. (fT3l) can be expressed in 

terms of the heat capacity Cf , 

dHf dSf 
Cf^^ = T^, (14) 

so that 

Gf{T)=Hf{T*)-TSf{T*)+ QdT-T -^dT. (15) 

If we assume that the heat capacity of the folded state does not change significantly over the 
temperature range of interest, Eq. (flST l simplifies to 



(16) 



Gf(r)=//f(r*)-rSf(r*)-Cf (r*-r + rin— J 



According to Eq. (fT6l) . we can deduce the free energy Gf of the folded state at temperature T from 
the thermodynamic properties at some other temperature T*. The same result holds for the free 
energy Gu{T) of the unfolded state. 

In the analysis of two-state transitions, it is convenient to use the transition (melting) temper- 
ature Tja as the reference temperature for both folded and unfolded states. Then, the free energy 
difference AG between the folded and unfolded states is given by 

AG(r)=A//(rn,)('i-^VAc('rn,-r-frin^y (iv) 

where we have used AG(rm) = 0. Equation (flTl) is commonly used to determine RNA stabil- 
ity from calorimetry experiments,'^^ since it expresses AG(r) in terms of measured changes in 
enthalpy and heat capacity. 

In simulations, we calculate the stability AG(r) of the folded RNA as follows. For each RNA 
illustrated in Figure [B we run a series of Langevin dynamics simulations at different tempera- 
tures T in the range from to 130 °C. Using the weighted histogram technique, we combine the 
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simulation data from all T to obtain the density of energy states, p{E), which is independent of 
temperature. The total free energy of the system, G{T), is then given by 

G{T) = -k^T In J p{E)exp (-^) dE, (18) 

where the integral, representing the partition function, runs over all energy states. At low T, the 
partition function in Eq. (fTST ) is dominated by the folded conformations and therefore, G{T) ^ 
Gf(r). This allows us to rewrite Eq. (fT6] ) as 

Gf(r) = G(r) + |^(r)(r-r)+r|^(r)('r-r + nn|^Y (i9) 

where we take T* =0°C to be the reference temperature for the folded state. To obtain Eq. (fT9l) . 
we used S = —dG/dT and C = —Td^G/dT^ . Equation (fT9l ) can also be used to compute the 
free energy of the unfolded state, Gu(r), if the reference temperature T* is chosen such that 
G{T*) ^ Gu(r*) — for example, T* = 130 °C. The stability AG{T) of the folded RNA is given 
by a difference between Gf{T) and Gu{T), as illustrated in Figured 

The present calculation is an alternative to the commonly used order parameter method to 
determine AG and can be applied to any folding/unfolding transition without further adjustments. 
Furthermore, in contrast to Eq. (fTTl . our approach will still work in systems that do not exhibit a 
two-state behavior, since it employs different reference temperatures for the folded and unfolded 
states. The only assumption is that at the reference temperature T* for the folded (unfolded) state 
the population of the unfolded (folded) state is negligible. At the reference temperatures chosen in 
our simulation, °C and 130 °C, this assumption is trivially satisfied. 

The formalism described above, including the weighted histogram technique, assumes that 
the conformational energy E in Eq. (fTSi) does not depend on temperature. However, the stacking 
interactions in Eq. dD and electrostatic interactions in Eq. ([8]) have T as a parameter. The stacking 
parameters Ugj in Eq. ([3]) are linear in T, so we can write C/sT = uo + k^Tui, where uq and mi are 
temperature independent. The Boltzmann factor of the second term, exp(— mi), does not contain 
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T and cannot affect the temperature dependence of thermodynamic quantities. In the data analysis 
using weighted histograms, it is convenient to incorporate this Boltzmann factor into the density of 
states p{E). Effectively, this means that the stacking interactions u\ can be omitted from the total 
energy E in Eq. (fTSl) and from all ensuing formulas. The electrostatic interactions in Eq. ([8]) depend 
on T nonlinearly, since Q, £ and X are all functions of T. However, we operate within a relatively 
narrow range of temperatures — the thermal energy k^T is between 0.54 and 0.8 kcal/mol. This 
justifies expanding the electrostatic potential C/el up to the first order in T, which then enables us 
to treat it similarly to Ugj. We expand Uel around 55 °C, in the middle of the relevant temperature 
range. We have checked that this linear expansion does not affect the numerical results reported 
here. 

Langevin Dynamics Simulations 

The RNA dynamics are simulated by solving the Langevin equation, which for bead i is m/f/ = 
— "yj-r, H-F, +f/, where m, is the bead mass, Yi is the drag coefficient, F,- is the conservative force, and 
f/ is the Gaussian random force, (f/(?)fj(?')) =6k^Tyi5ij5{t — t'). The bead mass m,- is equal to the 
total molecular weight of the chemical group associated with a given bead. The drag coefficient 
Yi is given by the Stokes formula, ji = 6Tzr\Ri, where r\ is the viscosity of the medium and Ri 
is the bead radius. To enhance conformational sampling, ^*^ we take r\ = lO^^Pa-s, which equals 
approximately 1% of the viscosity of water. The values of Ri are 2 A for phosphates, 2.9 A for 
sugars, 2.8 A for adenines, 3 A for guanines and 2.7 A for cytosines and uracils. The Langevin 
equation is integrated using the leap-frog algorithm with a time step At = 2.5 fs. 

Results and Discussion 

There are three adjustable parameters in the model: the corrective constant AGq in Eq. Q, the 
strength of hydrogen bonds [/^g in Eq. ([7]), and the length b, which defines the reduced phosphate 
charge Q in Eq. (fTOl) . The absolute value of the correction AGq should be relatively small, i.e.. 
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|AGo| < 1 kcal/mol. If our approach is successful, various RNA structures will be characterized 
by similar values of AGq and tZ/jg- "^^^ physical meaning of the variable Q implies that Q < I. 
The precise value of Q may depend on the specific RNA structure as well as the buffer properties, 
since both could determine the extent of counterion condensation. However, we find that Q does 
not vary much for different monovalent salt buffers or different RNA. 

Calibration of AGq and f/^g 

The parameters AGq and f/^g were adjusted to match the differential scanning calorimetry melting 
curve (or heat capacity) of the MMTV PK^^ at 1 M Na+ (Figure |6]). The list of all hydrogen bonds 
in the MMTV PK structure is given in [3l The secondary structure of the MMTV PK comprises 
five base pairs in stem 1 and six base pairs in stem 2 (Figure [T}. The tertiary structure is limited to 
singular hydrogen bonds and is not stable in the absence of Mg^+ ions.— 

We find that in simulations with c = 1 M in Eq. ^, the thermodynamic properties obtained 
are not sensitive to the magnitude of the phosphate charge. In particular, the simulation model 
yields similar heat capacities for the bare phosphate charge, 2 = 1, or if we assume counterion 
condensation, Q = Q*{T). This is not unexpected, since the electrostatic interactions are screened 
at high salt concentration and do not contribute significantly to the RNA stability. Therefore, we 
can identify AGq and t/^g which are ^-independent. 

The measured heat capacity at c = 1 M is reproduced well in simulation with AGq = 0.6 
kcal/mol and C/^g = 2.43 kcal/mol (Figure |6]). The model correctly describes the overall shape 
of the melting curve, including two peaks that indicate the melting transitions of the two stems. 
Stem 1 in the MMTV PK is comprised entirely of G— C base pairs (Figure [I) and, despite being 
shorter than stem 2, melts at a higher temperature. Although the melting temperature of stem 2 is 
reproduced very accurately in simulation, the melting temperature of stem 1 is somewhat under- 
estimated, 89 °C instead of 95 °C. We speculate that the failure to precisely reproduce both peaks 
is due to inaccurate estimates of the stacking parameters C/gj at high temperatures. In addition to 
several approximations involved in the derivation of U^j, the experimental data that were used in 
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this derivation were obtained at 37 °C or below. The approximation that enthalpies and entropies 
are constant may not be accurate for large temperature extrapolations. It is therefore expected that 
the agreement between experiment and simulation will be compromised at high temperatures. 

In the rest of the paper we set AGq = 0.6 kcal/mol and C/^g = 2.43 kcal/mol. Since the mag- 
nitude of the backbone charge could not be determined at high salt concentration, we will analyse 
the measurements of hairpin stability that cover a wide range of c. In this analysis we assume that 
Q is given by Eq. (flOl) . with b constant. 



Determination of b 

To estimate the reduced phosphate charge Q, we have computed the stabilities AG(c) of hairpins 
L8 and LIO (Figure [B for different values of b in Eq. (flOl) . We use these hairpins as bench- 
marks because their folding enthalpies and entropies have been measured over a wide range of 
c, from 0.02 M to 1 M Na+. The experimental AG(c) of L8 and LIO increase linearly with Inc 
for c < 0. 1 1 M, but the extrapolation of this linear dependence to c > 0. 1 1 M does not yield the 
measured stabilities at 1 M salt (Figure |7). In addition, the measured stability of LIO at 1 M is 
disproportionately larger than that of L8. For these reasons, we use 0.11 M as a reference salt 
concentration cq, instead of usual 1 M, and compare simulation and experiment in terms of relative 
stabilities AAG(c) = AG(c) - AG(co). 

The simulation model reproduces correctly the linear dependence of AAG on Inc, 

M^Gic) = -kc\n— (20) 

CO 

for c < 0. 1 1 M. It also predicts an upward curvature of AAG(c) for c > 0. 1 1 M (Figure|7]). We find 
that b = A.A h yields the best fit between the simulation and experimental values of kc. Note that, 
although the stability of RNA hairpins decreases sharply with temperature, the salt dependence 
of AG is mostly insensitive to T (Figure |7]). The linear slope in Eq. (|20] ) does not change with 
temperature in experiment and simulation. 
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An uncertainty in the analysis of the L8 and LIO data comes from the 5'-pppG, which is subject 
to hydrolysis in solution. The total number of phosphates Np may vary from 20 to 22 in L8 and 
from 22 to 24 in LIO. Panels (c) and (d) in Figure |7]show AAG for the hairpins with charge —Qe, 
-2Qe or -3Qe at the 5'-end for t = 31°C and Q = 0.60 (b = 4.4 A). Apparently, the charge of a 
terminal nucleotide has a strong influence on the hairpin stability. For DNA duplexes, the value of 
kc was shown to increase linearly with the total number of phosphates in the duplex, 

kc = 0.051 Np. (21) 

This formula assumes implicitly that all phosphates contribute equally to the duplex stability. We 
find that for short RNA hairpins, such as L8 and LIO, the end effects are significantly greater than 
1/A^p. In Figure H k,. = 0.0A45Np for LB with Np = 22 and kc = 0.04S2Np for LIO with Np = 24. 
Note that the ratio kc/Np shifts towards its value in Eq. (1211 with increasing the hairpin length. 

Although the experimental scatter in Figure |7]can be attributed to partial hydrolysis of the 5'- 
pppG, it is hard to establish the precise contribution of this effect. Therefore, we fix Z? = 4.4 A, 
which was obtained assuming no hydrolysis of the 5'-pppG. 

At c = 0.11 M, the simulation model predicts T^^ = 69.8 °C for the melting temperature of 
LB and AG = —6.6 kcal/mol for its stability at 37 °C. The corresponding experimental values are 
T^ = 75.7 °C and AG = -7.4 kcal/mol at 37 °C. For LIO, we have r^ = 66.5 °C, AG = -6.1 
kcal/mol in simulation and T^ = 73.0 °C, AG = —6.6 kcal/mol in experiment. Both hairpins 
are found to be less stable in simulation than in experiment. Predictions of hairpin stability at 1 
M salt, using the nearest neighbor model with stacking parameters from ref. 29, underestimate 
the melting temperatures and stabilities of L8 and LIO by a comparable amount. This suggests 
that some additional structuring may occur in the loops of these hairpins, which is not taken into 
account in theoretical models. Although our simulations account for possible base stacking in 
the loops, we do not consider any hydrogen bonds other than the six Watson-Crick base pairs in 
the hairpin stem (Figure [l). It is possible that bases in the loops of L8 and LIO form additional 
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hydrogen bonds, since these loops are relatively large. 

Melting at low ionic concentration 

Figure [8] compares the experimental heat capacity of the MMTV PK-~ at 50 mM K+ to the result 
obtained in simulation with c = 0.05 M in Eq. Q. It is not obvious a priori that hairpins and pseu- 
doknots should have the same reduced charge Q. Pseudoknot structures consist of three aligned 
strands of RNA, rather than two, and the high density of negative charge would be expected to 
promote counterion condensation. Nonetheless we find that the heat capacity of the MMTV PK 
computed using b = 4.4 A, which was established for hairpins, matches the experiment well (Fig- 
ure [Sh). Adjusting a single parameter b was sufficient to position correctly both melting peaks, an 
indication that Eq. ([8]) is suitable for a description of salt effects on RNA pseudoknots. The model 
also captures the characteristic property of the MMTV PK, that is, stem 2 is more strongly affected 
by changes in c than stem 1 . In experiment,— the difference in the melting temperatures of the two 
stems increases from 22 °C at c = 1 M to 32 °C at c = 0.05 M, which is related to a significant loss 
of stability for stem 2 in the low salt buffer. Note that neglecting counterion condensation (2=1) 
overestimates electrostatic repulsions between phosphates, rendering both stems significantly less 
stable in simulation than in experiment (Figure |8^). In particular, the melting transition of stem 2 
shifts to 4 °C, in stark contrast to the experimental melting temperature of 48 °C. 

A considerable difference in the stability of the two stems in the MMTV PK is further illustrated 
in FigurelH where we plot the probability that each stem is folded as a function of c. At 37 °C, stem 
1 is stable for all salt concentrations in the typical experimental range, whereas stem 2 undergoes 
a folding transition upon increasing c, with the midpoint at approximately 30 mM (Figure |9^). 
At 80 °C, the folding transition of stem 1 falls within the experimental range of c (Figure |9j5). 
However, at such high temperatures, the population of the unfolded state is non-negligible for all 
salt concentrations and, in the case of stem 2, it exceeds 80%. 

In Figure m we have used two different criteria for folding of the stems. For the solid curves a 
stem is considered folded if at least five base pairs have formed and for the dashed curves a stem is 
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assumed to be folded if at least one base pair has formed. Although the curves in Figure |9] depend 
on the criteria for folding, the numerical differences are small, especially at 37 °C. This is because 
the transition state in the folding of each stem corresponds to the closing of a loop by a single base 
pair, after which the formation of subsequent base pairs is a highly cooperative process. At 80 
°C, individual base pairs have a high probability of opening and closing without affecting the loop 
region, which contributes to the quantitative differences between the two definitions of the folded 
state (Figure I9J5). 

Conclusions 

We have developed a general coarse-grained simulation model that reproduces the folding thermo- 
dynamics of RNA hairpins and pseudoknots with good accuracy. The model enables us to study 
the folding/unfolding transitions with computational efficiency, as a function of temperature and 
ionic strength of the buffer. It is interesting that simulations using a single choice of model pa- 
rameters, AGq = 0.6 kcal/mol, t/^g = 2.43 kcal/mol and b = 4.4 A, show detailed agreement with 
available experimental data for the three RNA molecules in monovalent salt buffers. Although we 
have established the success of the model with applications to a few RNA molecules, the method- 
ology is general and we expect that the proposed force field can be used to study RNA with even 
more complex structures. Applications of the model to other RNA molecules will be reported in a 
separate publication. 

On the basis of the good agreement between simulations and experiments we conclude that, for 
c < 0.2 M, the effects of monovalent salt on RNA stability can be attributed to the polyelectrolyte 
effect. At c > 0.2 M, the results are more ambiguous both in experiment and simulation. There 
is mixed experimental evidence as to whether the linear dependence of the RNA stability on Inc 
extends all the way to 1 M (cf. L8 and LIO in Figure IT]). Our simulations predict a substantial cur- 
vature in AAG vs. Inc in the range c > 0.2 M (Figure|7]), where the Debye-Hiickel approximation is 
likely to be less accurate. However, the melting profile of the MMTV PK obtained in simulations 
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at 1 M is in good agreement with experiment (Figure |6]). Due to insufficient experimental data, it 
is hard to establish the extent to which simulations and experiments disagree at c > 0.2 M. 

We find that, both for the hairpins and pseudoknots in monovalent salt solutions, the reduction 
in the magnitude of the backbone charge due to counterion condensation is given by Eq. (flOl) with 
b = 4.4 A. This result is particularly interesting since folded pseudoknots have a higher density 
of backbone packing than folded hairpins. In counterion condensation theory of rod-like poly- 
electrolytes, the parameter b is the mean axial distance per unit bare charge of the polyelectrolyte. 
Notably, distance 4.4 A agrees well with the estimates of the counterion condensation theory for b 
in single-stranded nucleic acids.— "^ Therefore, we propose that, in our simulations, b describes 
the geometry of the unfolded state, which is similarly flexible for hairpins and pseudoknots. Fur- 
ther work on this issue and the reduction of RNA charge in the presence of divalent counterions 
will be published elsewhere. 
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Table 1 : Enthalpies Mi, entropies AS and melting temperatures T^ of single-stranded stacks, de- 
rived in this work. Enthalpies of hydrogen bond formation in Watson-Crick base pairs are given in 
the last two rows. In the first column, the 5' to 3' direction is shown by an arrow. 
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Table 2: Temperature-dependent stacking parameters U^j used in Eq. (|3]). The values of h corre- 
spond to AGq = 0.6 kcal/mol in Eq. Q. The melting temperatures Tm are given in Tabled] In the 
first column, the 5' to 3' direction is shown by an arrow. 
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Table 3: Hydrogen bonds in the MMTV PK 
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Figures 
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Figure 1: Secondary structures of studied RNA. Hairpins L8 and LIO have a 5'-pppG, while the 
MMTV PK does not have a phosphate group at the 5 '-end. 
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Figure 2: Illustration of the structural parameters in Eq. ([3]). Sites P, S and B are shown in black, 
green and red, respectively. The indices refer to different nucleotides, r is the distance between 
bases Bi and B2 in angstroms, and 0i(Pi,Si,P2,S2) and 02(P3,S2,P2,Si) are the dihedral angles 
in radians. 
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Figure 3: (a) A sample distribution of stacking energies C/sT (Eq. ©) from simulations of coarse- 
grained dimers. All dimer configurations with t/sT < —k^T are counted as stacked in Eq. ^. 
(b) The free energy, AG, of stack formation in dimer (^) , calculated using Eq. ^ with AGq = 0. 
Open and closed symbols show AG for different U^j = —h + k^{T — Tjn)s, where T^ is the melting 
temperature of (^) in Table [Hand h, s vary. Red solid line shows AG = AH — TAS for AH and A5 
given in Table[Il Same AG is obtained in simulation with h = 5.98 kcal/mol and s = 5.30 (closed 
symbols). 
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Figure 4: Illustration of the structural parameters in Eq. (|7]). Angle definitions depend on the site 
(P, S or B) which forms a hydrogen bond. Examples show hydrogen bonding between sites B and 
S (a) and between sites P and S (b). In (a), r (SI, B4) is distance, 0i (S4, B4, SI) and 02 (P2, 
SI, B4) are angles, i// (P2, SI, B4, S4), i/Zi (SI, B4, S4, P5) and v/2 (B4, SI, P2, S2) are dihedral 
angles. In (b), r (SI, P4) is distance, 0i (S4, P4, SI) and 02 (P2, SI, P4) are angles, i// (P2, SI, P4, 
S4), V/i (SI, P4, S4, P5) and v/2 (P4, SI, P2, S2) are dihedral angles. All other designations are the 
same as in Figure |2] 
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Figure 5: Geometrical definition of the stability AG{T) of the folded state. The solid curve shows 
the total free energy of the system G(T), given by Eq. (fTSl ). The free energies of the folded and 
unfolded states, Gf{T) and Gu(r), are estimated from Eq. dH]) using T* = °C and 130 °C, 
respectively (dashed curves). 
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Figure 6: Measured^^ (black symbols) and computed (red curve) heat capacity C of the MMTV 
PK in 1 M Na+. The computed C{T) is shown with respect to the heat capacity of the unfolded 
state at 130 °C. AGq = 0.6 kcal/mol, U^^ = 2.43 kcal/mol. In (a), 2=1. In (b), Q = Q*{T), 
b = 4.4 A. The blue dashed curve in (b) shows C{T) from simulation using the solvent viscosity 
which is ten times larger than the r\ specified in Methods. It can be rigorously established that the 
thermodynamic properties must be independent of r/ . In accord with this expectation simulations 
at high and low values of 7] are in good agreement with each other despite the difficulty in obtaining 
adequate sampling for large r\ . 
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Figure 7: The stability AG of hairpins L8 and LIO as a function of salt concentration c, plotted as 
AAG(c) = AG(c) — AG(co) , where cq = 0. 11 M. Panels (a) and (b) show comparison of experiment 
(symbols) and simulation (curves) at different temperatures, assuming no hydrolysis of the 5'- 
pppG. Panels (c) and (d) illustrate the contribution of the 5'-pppG to AAG(c) at 37 °C for different 
levels of hydrolysis: no hydrolysis (solid), partial hydrolysis (dashed) and complete hydrolysis 
(dash-dotted). Simulation curves are for Q = Q*{T), b = 4.4 A. 
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Figure 8: Same as in Figure |6]but in 50 mM K+. 
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Figure 9: The fraction of folded stem 1 (black) and stem 2 (red) in the MMTV PK, as a function 
of salt concentration c. A stem is folded if five base pairs have formed (solid lines) or if one base 
pair has formed (dashed lines). AGq = 0.6 kcal/mol, U^^ = 2.43 kcal/mol, Q = Q*{T),b^ 4.4 A. 
In (a), r = 37 °C. In (b), T = 80 °C. 
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